Audio–Visual Segmentation

Abstract

We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with this benchmark: 1) semi-supervised audio-visual segmentation with a single sound source and 2) fully-supervised audio-visual segmentation with multiple sound sources. To deal with the AVS problem, we propose a method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss to encourage the audio-visual mapping during training. Quantitative and qualitative experiments on AVSBench compare our approach to several existing methods from related tasks, demonstrating that the proposed method is promising for building a bridge between the audio and pixel-wise visual semantics. Code is available at https://github.com/OpenNLPLab/AVSBench.
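As a rough illustration of what "audio semantics as guidance" for a pixel-level map can mean, the sketch below scores every pixel feature against a single audio feature and thresholds the result. It is a toy example under stated assumptions, not the paper's interaction module: the function names `audio_guided_map` and `binarize`, the scaled dot-product similarity, the sigmoid, and the 0.5 threshold are all hypothetical choices made here for illustration.

```python
import math


def audio_guided_map(pixel_feats, audio_feat):
    """Return a per-pixel score in (0, 1) for how well each pixel matches the audio.

    pixel_feats: H x W grid of feature vectors (lists of floats).
    audio_feat:  one feature vector of the same length for the frame's audio.
    Both inputs are assumed to come from some upstream encoders (not shown).
    """
    scale = math.sqrt(len(audio_feat))
    score_map = []
    for row in pixel_feats:
        out_row = []
        for feat in row:
            # Scaled dot product between the pixel feature and the audio feature.
            sim = sum(p * a for p, a in zip(feat, audio_feat)) / scale
            # Sigmoid squashes the similarity into a soft mask value.
            out_row.append(1.0 / (1.0 + math.exp(-sim)))
        score_map.append(out_row)
    return score_map


def binarize(score_map, threshold=0.5):
    """Turn the soft score map into a binary mask (hypothetical threshold)."""
    return [[1 if s > threshold else 0 for s in row] for row in score_map]


# Toy 2x2 "frame" with 2-dimensional features; values are made up.
pixel_feats = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[-2.0, 0.0], [2.0, 2.0]],
]
audio_feat = [1.0, 0.0]

mask = binarize(audio_guided_map(pixel_feats, audio_feat))
print(mask)
```

Pixels whose features align with the audio feature end up in the mask, while orthogonal or opposed features are left out. A learned model would replace the fixed similarity with trained projections and would also use temporal context across frames.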

Similar resources

Application of Topic Segmentation in Audiovisual Information Retrieval

Segmentation into topically coherent segments is one of the crucial points in information retrieval (IR). Suitable segmentation may improve the results of IR system and help users to find relevant passages faster. Segmentation is especially important in audiovisual recordings, in which the navigation is difficult. We present several methods used for topic segmentation, based on textual, audio a...

Audio content analysis for online audiovisual data segmentation and classification

While current approaches for audiovisual data segmentation and classification are mostly focused on visual cues, audio signals may actually play a more important role in content parsing for many applications. An approach to automatic segmentation and classification of audiovisual data based on audio content analysis is proposed. The audio signal from movies or TV programs is segmented and class...

Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation

In recent years, the established link between the various human communication production domains has become more widely utilised in the field of speech processing. In this work, a state of the art Semi Adaptive Appearance Model (SAAM) approach developed by the authors is used for automatic lip tracking, and an adapted version of our vowel based speech segmentation system is employed to automati...

Plundermatics: Real-time Interactive Media Segmentation for Audiovisual Analysis, Composition and Performance

This paper presents methods for real-time automated media segmentation, interactive audiovisual analysis, and media search in composition and performance tasks. In addition, we detail a use case where these tools have been deployed successfully as part of high profile public, national broadcast events, installations and exhibitions. These tools utilise a combination of data-mining and informati...

Segmentation and Annotation of Audiovisual Recordings Based on Automated Speech Recognition

Searching multimedia data in particular audiovisual data is still a challenging task to fulfill. The number of digital video recordings has increased dramatically as recording technology has become more affordable and network infrastructure has become easy enough to provide download and streaming solutions. But, the accessibility and traceability of its content for further use is still rather l...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-19836-6_22